Heat Conduction Process on Community Networks as a Recommendation Model 
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Using heat conduction mechanism on a social network we develop a systematic method to predict 
missing values as recommendations. This method can treat very large matrices that are typical 
of internet communities. In particular, with an innovative, exact formulation that accommodates 
arbitrary boundary condition, our method is easy to use in real applications. The performance is 
assessed by comparing with traditional recommendation methods using real data. 
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With the advent of the internet, there sprout many 
web sites that enable large communities to aggregate 
and interact. For example livejournal.com allows its 3 
million members to share interests and life experiences; 
del.icio.us is a social bookmark service for people to share 
their findings on the World Wide Web. Thousands of 
such web sites are built by web entrepreneurs and ac- 
tivists for the public, and their number is growing ever 
faster. This brings about massive amount of accessible 
information, more than each individual is able or willing 
to process. Information search, filtering, and recommen- 
dation thus become indispensable in internet era. Ideally 
speaking, a good recommendation mechanism should be 
able to "guess" what a person may want to select based 
on what he or she already selected [l|, Many such 
mechanisms are in actual use (like www.amazon.com 
proposing its readers with new books), however, jury is 
still out as to what is the best model. For a review of 
current techniques, see Q. 

Based on the heat conduction (or diffusion) process, 
we propose a recommendation model capable of handling 
individualized boundary conditions (BC). To better ex- 
plain our model, we first illustrate using the friendship 
network of N people: each person (member) is a node, 
and a pair of nodes is connected by an edge provided 
they are mutual friends. The collection of these informa- 
tion forms the symmetric adjacency matrix A: element 
Aij = 1, or depending on whether people i and j are 
mutual friends (1) or not (0). Although it is possible to 
consider asymmetric connection, this generalization will 
not be studied here. To recommend friends to any in- 
dividual member, we first set (Dirichlet) BC: to set the 
values on the directly connected nodes as 1 and some re- 
mote nodes (will be further specified) as 0. Values on 
all other nodes arc treated as variables to be determined. 



These values can be interpreted as the probabilities that 
these nodes might be selected as friends. 

We now describe an efficient and effective strategy to 
solve the proposed heat conduction problem. From A, we 
first construct a propagator matrix P = D^ 1 A, where D 
is the diagonal degree matrix. Denote H as the temper- 
ature vector of N components: the source-components 
are high temperature nodes with temperature 1 ; the sink- 
components are low temperature nodes with temperature 
0. Our task is to find, through thermal equilibrium, the 
temperatures associated with the remaining nodes that 
are neither sinks nor sources. The discrete Laplace oper- 
ator, analog of —V 2 , on this network is L = I — P, where 
I is the identity matrix. We only need to solve 



LH = f 



(1) 



where / is the external flux vector. Note that this is the 
discrete analog of — nV 2 T(r) = V • J(r) with H(i) plays 
the role of nT(r) and f(i) plays the role of V • J(r). 

Because Laplace operator conserves total heat and 
tend to spread heat from high temperature region to low 
temperature region, the only way to maintain the fixed 
temperature values at the sources and sinks is to apply 
external heat flux (inflow at sources and outflow at sinks). 
For the rest of the nodes, the equilibrium condition de- 
mands that no net heat flux should occur. Therefore, 
the only allowed nonzero components of / are source- 
and sink-components. 

The computation of the temperature vector is straight- 
forward. It is convenient to group the source and sink 
components together into a block Hi, and the rest free 
variables another block H 2 - That is 



H = 



Hi 

H 2 



(2) 



Likewise, we group the Laplace operator in a similar 
fashion and eq. |T]) may be expressed as 
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All we need to solve is the homogeneous equation 
L21H1 + L22H2 = , 



(4) 



without the need to know /. Fixing the values of Hi, H2 
can be readily found using standard iterative methods [J . 
The above approach, although straightforward, repre- 
sents a daunting challenge: for each individual, we must 
solve the huge matrix problem once - a prohibitively ex- 
pensive task for a typical internet community having mil- 
lions of members. 

The standard way to get around this dilemma is to 
resort to the Green's function method. Starting from 
cq.([T]) we would like to have a Green's function fl' such 
that eq.(p} can be inverted: 



Hi 

H, 



= n' 



(5) 



to get H 2 = fi 21 0' n H i. However, fl' = L^ 1 = 
(I — P)^ 1 is divergent: the Laplace operator has a zero 
eigenvalue and the inverse L~ l is meaningful only if 
{Hi,H2) T is in the subspace that is orthogonal to the 
eigenvector of zero eigenvalue. A fortunate scenario like 
this has occurred in the studies of random resistor net- 
works @,||. 

To simultaneously deal with all possible BC, we lose 
the freedom to limit the solution to a certain subspace. 
Nevertheless, we have a good understanding regarding 
this divergence. Basically, the P matrix has an eigenvalue 
one with the right eigenvector being a column of Is 

|u ) = (l,l,..-,lf 



and with left eigenvector being 

/ 01 ( di d% djv 



where di denotes the degree of node i and d = di 
being the sum of degrees. Note that with this notation, 



we have 



= 1. 



We may then decompose P into 

P = Q+\u°)(v°\ 

with Q\u°)(v°\ = \u°){v°\Q = 0. Further, the spectral 
radius of Q is now guaranteed to be smaller than 1 and 
thus (7 - Q) is invertible with (I - Q)- 1 = ££L Q". 
We may then rewrite the eq.© as 



(i-Q) 



Hi 
H2 



\u°)(v°\ 
c(H)\u°) 



Hi 
Hi 



(6) 



where the if-dependent constant may be written as 
c{H) = (vi\Hi) + (t^l^)- We need to explain the nota- 
tion further. Basically \u1), represents a column vector 
whose components are obtained from the column vector 



\u°) with component labels corresponding to that of the 
sources and the sinks. On the other hand, lu®) repre- 
sents a column vector that is the remainder of \u°) after 
removing the components whose labels correspond to the 
sources and sinks. Similarly, we define (vi \ to be a row 
vector whose components are obtained from the row vec- 
tor (v°\ with component labels corresponding to that of 
the sources and the sinks; while {v 2 \ represents a row 
vector that is the remainder of (u | after removing the 
components whose labels correspond to the sources and 
sinks. To simplify the notation, we will represent c(H) 
by c without explicitly showing its H dependence. 

Note that since Q\u°) = 0, upon multiplying fl = (I — 
Q)" 1 to both side of eq.© we have 



Hi 
Ho 



or equivalently 



Hi 
H 2 
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(7) 



(8) 



Consequently, we may write H2 in the following form 

l#2> =c\ul) + n 2 in^\Hi) -cfiaifi^K) ■ (9) 
Using the definition that c = (vi\Hi)-\-(v2 W2), we obtain 
c = (v i\Hi)+(v^\Q 2 ia^\Hi)+c [(v°2\u°2) - (v°\n 2 iQ^\u°i)] 



or equivalently 



1- [(V°2\U°2) ~ (V^l^lul)] 



(10) 



Substituting this result back to eq. @ , we obtain H2 with 
computational complexity solely depending on fulfill 1 - 
Note that we only needs to invert the matrix (I — Q) 
once and for all. Upon specifying the boundary nodes, 
one needs to reshuffle the rows and columns of the matrix 
as well as vectors - a relatively efficient operation. This 
operation groups the source nodes and sink nodes in one 
block to make easy the computation of Jl^ . 

Let us emphasize that our final expression is writ- 
ten in a rather general setting that it can be applied 
to cases when P is either row-normalized or column- 
normalized. In the case of column-normalized P 



we 



will have 



) = ((v° 



row norm 



|) T and 



(|u° ow norm .)) T - The solution structures (| MTU|) . however, 
does not change. 

Although an exact Green's function method with 
Dirichlct boundary condition using spectral analysis 
(eigenvalues and eigenvectors) has been established by 
Chung and Yau [J, we find our method more conve- 
nient for computational purpose. With our method, the 
Greens function Q is computed once and can be used for 
all different BC. This is immensely more efficient than 
finding all the eigenvalues and eigenvectors for every BC 
needed for each individual. Furthermore, it would not be 
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FIG. 1: Comparison between the exact solution (bold line) 
egs. (| 91100 and our approximation. For both cases we plot 
the "hottest" nodes. For better visualization we shifted the 
profiles such that the first node coincide in the graph. We 
observe a good agreement between the exact solution and the 
approximation for M — 10 in our artificial network. 

practical to find all the eigenvectors of matrices resulting 
from networks of millions of nodes. 

To apply our method, one may either choose to fully 
invert (/ — Q) or take its approximate form. The di- 
rect inversion of (/ — Q) may still be computationally 
challenging for a matrix of size millions by millions. In 
terms of approximations, we find the use of (/ — Q)^ 1 = 
limM-»oo f2(iV/) particularly useful, with 

O(M) = [I + P + ■ ■ ■ + P M - Af|tto)<«ol] . (11) 

This approximation gets better for larger M. This is be- 
cause the larger M is, the smaller the difference between 
P M and \u ){vo\- One may then use f2 2 i(M)fiiT 1 (M) 
in place of f^i^il 1 ■ The quality of this approximation 
may be be verified comparing the two models: the exact 
sorution(|9"l ll0[) versus the approximate one (ie. replacing 
^2i^il 1 by fi2i(M)rii]" 1 (M) in the exact solution). 

The convergence of the approximate solution to the ex- 
act solution (eqs.(f MTrJ|) ) was first tested on an artificially 
generated random network of 100 nodes. Aside from the 
condition that the nodes do not form disjoint clusters, 
a pair of nodes has probability p = 0.1 to be connected. 
One then randomly selects a sink node and a source node 
that are not directly linked. We expect to get very simi- 
lar shape of the temperature-profile as in the exact case. 
This is because for the row-normalized matrix, the \u°) 
vector being a column vector with 1 in each entry may 
induce a small but uniform offset in the approximate so- 
lution. In Fig. [TJ we plot the "temperature-profile" of 
the 15 hottest nodes from the exact solution and the 
"temperature-profile" of the same nodes using our ap- 
proximation solution of various M. A good agreement 
between the exact solution and the approximate solution 
is reached at about M = 10. 

To test the usability of our approach in real world, 
we use the movielens database. MovieLens (movie- 
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FIG. 2: Prediction performance on movielens database. The 
heat conduction model outperforms the mean predictor and 
the Pearson correlation based method as well. £ denotes the 
fraction of possible votes in the matrix. The vertical line, 
corresponding approximately to the giant cluster formation 
threshold in the movie - movie network, has vote density 
£ « 2iV~ 1/2 M" 1/6 [!(}, where N is the number of users, M 
is the number of movies. 



lens.umn.edu; grouplens.org) ratings are recorded on a 
five stars scale and contain additional information, such 
as the time at which an evaluation was made. The data 
set we downloaded contains N = 6040 users x M = 3952 
movies. However, only a fraction £m = 0.041 of all possi- 
ble votes were actually expressed. To be able to perform 
the calculation in reasonable time, we decide to further 
reduce the data size in each dimension by roughly 50%. 
To preserve the statistical properties of the original data, 
the pruning is done randomly without bias. In particu- 
lar, we tried to maintain the probability distribution of 
the number of votes per users, as well as the sparsity and 
the N/M ratio. We want to stress that this is crucial 
when testing the performance of predictive algorithms 
on real data in an objective way. In fact, many recom- 
mender systems can be found in the literature that rely 
on dense voting matrices [1, 0, at least in the traning 
data set. Typically, users who have judged too few items 
are struck out, as well as items that have received too 
few votes. We did not comply to such convention and 
made an effort to keep the filtering level as low as possi- 
ble, although this makes predictions much more difficult. 

Once filtered, we cast the data set in a vote matrix V, 
with number of users N = 3020 and number of movies 
M = 1976. In this reduced vote matrix, the matrix el- 
ement V a j represents the number of stars assigned to 
movie j by user a and is set to zero for unexpressed 
votes. The total filling fraction of V is £m = 0.0468. 
The votes in V are then sorted according to their relative 
timestamps. The last nt cs t = 10 4 expressed votes are col- 
lected to form our test set, while the rest of the expressed 
votes form our training set. We denote by V(i) the vote 
matrix information up to time t. That is, in ~V(t) all the 
unexpressed votes up to time t are set to have zero star. 
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For the purpose of rating prediction, one will need a 
movie - movie network. To accomplish this task, one may 
compute the correlation coefficient Cij (t) between movie 
i and movie j using the expressed votes up to a certain 
time t in the training set. Specifically, we denote /J>i(t) = 

lEL^'ffl ^d «r?(f) ee iE«=ilV a ,(t) ~ Hi{t)f. 
The correlation coefficient reads 



Cn{t) 



ai(t)<7j(t) 



(12) 



With a specified cutoff C cu t, one obtains an adjacency 
matrix A(t), with A tj {t) = 9{C tj (t) - C cnt (t)). The value 
of C cu t(i) is set so that the average degree per node k(t) 
for the movie - movie network has the same number of 
non-zero entries as [V(i)] T [V(i)]. 

Keeping the test set data fixed, we progressively fill 
the vote matrix the training set data over time (using 
the relative time stamps), say up to time t. We then 
use A(t) to construct the the propagator D{t) based on 
the information accumulated up to t. For each viewer 
(user), the BC is simply given by the votes expressed by 
the user up to time t. In the event that a user only has 
one vote (or none) up to time t, the BC for that user is 
given by randomly choosing one (or two) movie(s) and 
use the average rating(s) of the movie(s) up to that time 
as the boundary values [ll[. We then use our algorithm 
to make predictions on the entire test set. 

This test protocol is intended to reproduce real ap- 
plication tasks, where one aims to predict future votes 
-which is, of course, much harder than predicting ran- 
domly picked evaluations. It is somewhat less realistic to 
fix the test set once and for all, but this has the advantage 
to allow for more objective comparisons of the results. 
Many different accuracy metrics have been pro pos ed to 
assess the quality of recommendations (see ref. [12|), we 
choose the Root Square Mean Error: 



RSME = 



(/3,j)etest 



Vf3j) 2 /ntcst, 



(13) 



where Vp ■ represents the predicted vote from our algo- 
rithm, V/3j represents the actual vote (rated by user (3 
on movie j) in the test set, and the sum runs over all 
expressed votes in the test set. In our experiments, the 
RSME is calculated, at different sparsity values £, on a 
unique test set. 

Fig. [2] summarizes the performance comparison of our 
model with the mean predictor (the prediction is sim- 
ply given by the objects mean value) and the widely 
used Pearson correlation based method [HI, [l4j|. Our 
model outperforms both after enough votes (of the order 
of 7V 1//2 M 5 / 6 ) have been expressed. Since the dimensions 
of the vote matrix V is known in a real application, given 
the number of expressed votes, it is relatively easy to see 
where one stands in terms of information content and 
whether our method will perform well using the given 
partial information. 

In summary, we have devised a recommendation mech- 
anism using analog to heat conduction. The innovation 
of our method is its capability to compute the Green's 
function needed just once to accommodate all possible 
BC. In terms of generalization, it is apparent that our 
method can be applied to network with weighted edges, 
with Ajj = Wij > 0. Whether such a generalization will 
improve the performance will be investigated in a sepa- 
rate publication. Finally, we stress that our study is not 
aimed to extract statistical properties out of networks 
through constructin g m odel networks mimicking the real 
world networks [IH lid]: nor are we pursuing analysis of 
slowly decaying eigenmodes [TtJ in the absence of bound- 
ary condtitions. Instead, our goal is to provide a frame- 
work that is capable of providing individualized informa- 
tion extraction from a real world network. 
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